8085 decoupling tsv #8086
Conversation
- Use the Univocity annotations on the model
- Add proper validation restrictions
- Make the column headers (or future mappings) part of the model by adding a proper enum, representing the (TSV column) order and the key values
Force-pushed from b516482 to 82b87f6
- Make column headers part of the model class in a reusable fashion by using an enum plus constants. The enum also represents the order and key values for the parser.
- Add Univocity parsing annotations to setter methods of the model class
- Add validation annotations where necessary
- Add tests for everything
Force-pushed from 4d9c044 to fcb0dbd
…QSS#8085
- Add headers enum like with MetadataBlock and DatasetFieldType
- Add a placeholder for alternate values that need references to the dataset field this CVV is a part of
- Make the alternatives come from a column with a header (this behaviour is currently undocumented!)
- Proper validation as with the other data bindings
- Includes lots of tests to make sure we notice when it breaks
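To illustrate the "headers enum" idea from the commits above: a minimal, stdlib-only sketch of how TSV column headers can live in an enum whose declaration order doubles as the column order. The enum and header names here are illustrative assumptions, not the PR's actual code.

```java
import java.util.Arrays;

// Hypothetical sketch: column headers for a controlled-vocabulary TSV block.
// Declaration order mirrors the TSV column order; the string is the header key.
enum CvvHeader {
    DATASET_FIELD("DatasetField"),
    VALUE("Value"),
    IDENTIFIER("identifier"),
    DISPLAY_ORDER("displayOrder");

    private final String key;

    CvvHeader(String key) { this.key = key; }

    /** The literal column header string as it appears in the TSV file. */
    String key() { return this.key; }

    /** Zero-based column position, derived from declaration order. */
    int column() { return this.ordinal(); }

    /** All header strings in column order, e.g. to hand to a parser's field selection. */
    static String[] keys() {
        return Arrays.stream(values()).map(CvvHeader::key).toArray(String[]::new);
    }
}
```

Keeping order and key in one place means the parser, the model, and the tests can all reference the same constants instead of duplicating header strings.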
I am amazed by this work. (For Hacktoberfest I was playing with creating unit tests for the DatasetField and -Type classes, but they were not as extensive yet.) If I may ask, are you not creating a strong coupling between the "TSV"s and the core data model relating to fields? What if managing fields and field types should be editable/creatable individually via the API, or controlled vocabularies updated using other formats like SKOS in Turtle/JSON-LD/...? Would the suggested approach prevent those possibilities? I understand that this is not currently being worked on, so I don't expect my comment to influence anything soon. I was just wondering :)
Hey @bencomp, thanks for the 🌹 🌷 💮 ! I really had to look at what I did there. And I have to say: I kind of gave up on this, as I also saw (some of) the problems you mentioned. The biggest obstacle, however, was the inability to reuse the TSV parser library with what we have as our custom metadata block format. I initially created this to iron out the problems with the TSVs and Solr configuration. So my other approach, which I hope to continue at some point, can be found here: #8320 — I started to write a validating TSV-format parser in there, as I also found there is no good and complete spec of our format.
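For a sense of what "a validating TSV-format parser" could start from: a tiny, stdlib-only sketch that splits rows and rejects files whose header row does not match the expected columns. Class, method, and header names are assumptions for illustration, not the parser from #8320.

```java
// Illustrative sketch of header-validating TSV parsing, stdlib only.
class TsvBlockParser {

    /** Split one TSV line into cells, preserving empty trailing cells. */
    static String[] cells(String line) {
        return line.split("\t", -1);
    }

    /** Fail fast unless the header line carries exactly the expected headers, in order. */
    static void requireHeaders(String headerLine, String... expected) {
        String[] found = cells(headerLine);
        if (found.length != expected.length) {
            throw new IllegalArgumentException(
                "expected " + expected.length + " columns, found " + found.length);
        }
        for (int i = 0; i < expected.length; i++) {
            if (!found[i].equals(expected[i])) {
                throw new IllegalArgumentException(
                    "column " + i + ": expected '" + expected[i] + "', found '" + found[i] + "'");
            }
        }
    }
}
```

Validating the header row up front is what turns silent column-shift bugs into loud, testable parse errors.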
What this PR does / why we need it:
Testing, reuse in tests, and future features based on TSV metadata schema files all need abstraction and parsing outside the API code, where the conversion currently happens.
This PR changes the underlying infrastructure for this, adds extensive tests, and implements actual checks on restrictions such as no circular dependencies between fields, a maximum depth for compound fields, and so on.
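The "no circular dependencies, max depth of compounds" checks mentioned above can be sketched with a plain child-to-parent map; this is a hedged, stdlib-only illustration of one way such checks could look, not the PR's actual implementation.

```java
import java.util.Map;
import java.util.Set;
import java.util.HashSet;

// Hypothetical sketch: restriction checks over field parent links.
class FieldGraphChecks {

    /** Returns true if following parent links from 'field' revisits a field (a cycle). */
    static boolean hasCircularParent(String field, Map<String, String> parentOf) {
        Set<String> seen = new HashSet<>();
        String current = field;
        while (current != null) {
            if (!seen.add(current)) {
                return true; // revisited a field: circular dependency
            }
            current = parentOf.get(current);
        }
        return false;
    }

    /**
     * Nesting depth of a field: 0 for top-level, 1 for a child of a compound, etc.
     * Assumes the graph is acyclic; run hasCircularParent first.
     */
    static int depth(String field, Map<String, String> parentOf) {
        int d = 0;
        for (String p = parentOf.get(field); p != null; p = parentOf.get(p)) {
            d++;
        }
        return d;
    }
}
```

A max-depth restriction then reduces to asserting `depth(field, parentOf) <= MAX` for every field after the cycle check has passed.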
This is a blocker ⚡ for #5989
Which issue(s) this PR closes:
Closes #8085
Special notes for your reviewer:
None yet.
Suggestions on how to test this:
Extensive unit and integration testing will be provided.
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Nope.
Is there a release notes update needed for this change?:
I don't think so, as it's part of the code infrastructure.
Additional documentation:
None yet. (Might make the docs on metadata blocks more precise in a few places)